Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7.x islandora 2046: Allow Islandora Batch to be extended for more fun #105

Open
wants to merge 3 commits into
base: 7.x
Choose a base branch
from

Conversation

DiegoPino
Copy link

@DiegoPino DiegoPino commented Sep 4, 2018

JIRA Ticket: (https://jira.duraspace.org/browse/ISLANDORA-2046)

What does this Pull Request do?

Allows Islandora Batch to be used for other more repository actions (like delete or update) but keeps "ingest" as default to allow any other extending class of IslandoraBatchObject to keep working as before. Also changes islandora_batch_queueprimary index to a compound one, to allow same Islandora installation to keep same PID in different sets, but, still enforces uniqueness inside a single SET.

What's new?

If you code a little, your Islandora Batch could be able to update, edit, drop, purge, clone or QA your objects. Up to you. If you don't code, all is still the same in Islandora paradise ! 🌴
But: this opens a wide open world to reusing our batch queue for post-processing, editing, etc operations, including working on existing objects using pieces from newly created in-batch ones.
Also: this is a first step, probably more can be done and will be done in the future.

How should this be tested?

Please make sure that your existing Batch Sets and New ingests still work. Not much more?

Enjoy/fear the flexibility.

Islandora IMI uses this patch already for its update capabilities. As you see implementation is quite simple.

Additional Notes:

  • Does this change require documentation to be updated? No
  • Does this change add any new dependencies? No
  • Does this change require any other modifications to be made to the repository (ie. Regeneration activity, etc.)? No
  • Could this change impact execution of existing code? Nope.

Interested Parties

@Islandora/7-x-1-x-committers @adam-vessey @jordandukart @jonathangreen

…ions than just pure ingests (#1)

* First steps on abstract the repository action

* Using a get Method for action description

Suggested verb in past tense

* Remove writing back ingested object

To match qadan:7.x-ISLANDORA-2055

* Change queue primary Key

This will allow to have same PID in multiple different sets, just
restricting the same PID to exist multiple times in a single set

* Check if Object exists before calling ingest

This should have been there forever. Check if object is in place before
trying to ingest

* Combined Primary Key

This allows for same PID to exist in different Batch Sets. Quite useful
for so many reasons

* DCS
@DiegoPino DiegoPino changed the title 7.x islandora 2046: Allow Islandora Batch to be extended for more act… 7.x islandora 2046: Allow Islandora Batch to be extended for more fun Sep 4, 2018
}
}
else {
$context['message'] = t('%pid not ready for ingest.', array('%pid' => $ingest_object->id));
$context['message'] = t('%pid not ready for ingest.', array('%pid' => $islandora_batch_object->id));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but should ingest not be replaced by the selected action (if so, perhaps re-wording to '%pid cannot be %action: object not ready.' could help with the grammar...)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, totally. Only issue i had there is verb conjugation/my-lack of wording. I don't know how to convert that phrase to "simple past", since "ingested" is what i was using/suggesting. I could, instead of returning a single "action" description, maybe return an array, with present/past verbs. Ideas? Or other way, change wording of other past-dependant sentences to present?
Also, there are many other mentions (like function names) to ingest, i tried to go for a simple solution for now that allowed the use case to avoid too much refactoring. Reason: every-time i have done big-time refactoring everyone ends using my branch and patch, but i don't get merged into core, too much of a commitment!

Totally open to suggestions

Copy link
Author

@DiegoPino DiegoPino Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe..
'Successfully %action %pid'which leads to "successfully ingested islandora:monster"
could be reworded as
'We managed to successfully ingest islandora:monster' ? or Islandora managed to, or Islandora Batch, etc..
or "Your Ingest of islandora:monster was successful"
What do you think @d-r-p ? That would a single present tense verb to be used to describe the action itself

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think the more compact the message is, the better. On the other hand, leaving past tenses lying around is somehow awkward (by the way, I guess it would be a good idea to wrap whatever we decide on with t()) - let alone present + past tenses. If you do not like my earlier suggestion of reformulating that one message into past tense, how about returning t("Ingest") and alerting the user with '%action action for %pid succeeded.', '%action action for %pid failed.', etc... (or similar)?

Regarding the naming of functions (and files?), perhaps a little bit of refactoring would be good. I think, otherwise, this might cause confusion if picked up after some time. But feel free (you, or anybody) to disagree...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I missed your earlier suggestion! (sorry), yeah, sounds fine to me and i will go for that, thanks. t() and watchdog don't play well... since we are kinda accumulating the messages. I remember having quite some issues with that in the past so will stay away from that now. Also, i always forget our classes are really not generic php classes, they are used inside the weirdness of drupal, so t() can be used...

About naming of functions and files. Refactor can bring fear and riots as said before. Will wait a bit and if someone else feels it is needed right now can do of course. I would need to deprecate some functions and maybe too much. Lets discuss this. This was meant to actually be eventually merged!

@DiegoPino
Copy link
Author

@d-r-p i did an extensive review of te how deep the notion of a single PID inside the whole islandora batch system DB is hard coded and it is quite extensive/my head hurts. Its not complicated at all, coding around is quite simple, but still the PID (id key in the each table) is used to bind messages and states to the queue.There is an autoincrement 'bid' key in the main queue table that i plan to use to change that, adding it as the new binding one. Basically the queue was designed to handle single PID always, even the DB helper functions are made around that idea. I don't fully like the table normalization here or the use of foreign keys on the same table to accommodate the parent/child relation in batch, but i understand why this was done in the first place and also why multiple instances of the same PID was not a need.

I wonder if you are familiar with how this works at all (sorry for bothering you) but thinking about doing some pretty heavy alters to make 'bid' the key that binds all tables instead of the 'id' aka PID, but keeping PID anyway all around to make queries faster. That will also require to do an update hook that updates the existing values in the DB but again afraid of getting stuck in no committers interaction or merging. @adam-vessey @jordandukart what do you think? Is that change (would be not breaking) something you guys would, if not support, at least consider as feasible? Other @Islandora/7-x-1-x-committers still interested in 7.x?

@DiegoPino
Copy link
Author

DiegoPino commented Sep 6, 2018

FYI: before going further, here is an idea on how the altered schema would look like
https://gist.github.com/DiegoPino/e7819c6f67e50b7a4193e43b2744e1f3
I decided to go simply adding also a 'sid' to the other tables, specially since when dealing with parent relationships there is no way i can know the bid of the parent before it is added to the database and that could happen after the child is added if weirdly coded.
Also noted that foreign keys are not used at all by Drupal 7, they are just for documentation purposes but still we were using "fields" and the document (not saying it is right one, just what drupal's docs and other modules do) is "columns". Thoughts?

@bryjbrown
Copy link
Member

@DiegoPino I just tested this on a VM with three example batch processes:

  • Reapplying scholar embargoes
  • Zip importing
  • DOI importing

In all three cases this behaved as expected. I'm not sure how to test the new behavior, but I can confirm that this at least doesn't affect existing batch processes.

I'm not super familiar with how batch processing works in Islandora, but the idea of generalizing it to do more than just ingests seems like a great idea, and your code looks good to me in terms of removing hardcoded references to ingestion and replacing them with variables to describe the action being applied. I'd feel more comfortable if someone more familiar with the batching in Islandora gave this a look, but its a solid 👍 from me :)

* Recomended value is a verb in past tense.
*/
public function getRepositoryActionDescription() {
return "ingested";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manner in which this is implemented/used will lead to messages which are not quite appropriately translatable... as is, may lead to incorrect grammar in some languages.

... Would have to allow each implementation to provide their own complete messages (passing the arguments to receive a full response)? Or just the individual template strings?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that mean generating full messages for success, exceptions and errors? Issue i see with that is that adding then afterwards arguments coming from the actual batch would make things quite complicated. What do you propose

Copy link

@d-r-p d-r-p Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam-vessey @DiegoPino Perhaps something along the lines of what I suggested earlier ('%action action for %pid succeeded.', etc...) might be easier to translate, while allowing us to keep batch arguments? [EDIT: Then the above function would return "Ingest".]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As indicated... allow implementations to either provide complete messages or template strings.

Or a particular implementation approach? Kind of leaning towards the template things... Possibly one of:

  • a method passed a constant indicating the type of message, returning the template string, into which we then substitute our few things using format_string(); or,
  • a method returning a mapping keyed by such a constant we then select the message from and similarly throw it at format_string()

... there are a number of ways it could be accomplished.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will work on the template strings idea, probably 3 ones would suffice. Wonder in that case if the full generation t() et all // - hate having t()in a class...- + argument expansion/replacement should happen an implementing class or should stay on batch itself as it is today? ideas @d-r-p @adam-vessey @bryjbrown ? The simpler the better i would say because the more i add the more i will be open other wormholes .. the implementation would be aware that batch itself only provides the %pid really...and some Drupal exception message...

@@ -8,9 +8,10 @@
/**
* Batch interface.
*
* Implementing classes should subclass some version of FedoraObject, so
* Implementing classes should subclass some version of FedoraObject, so.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this was there previously, but... this isn't quite a full sentence, here... Really... not sure the intent of this comment... just seems to reinforce the extends below... could just remove the comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally. I was more like waiting for someone to complete the sentence! but yeah, can remove

@@ -45,7 +45,7 @@ function islandora_batch_schema() {
'not null' => FALSE,
),
),
'primary key' => array('id'),
'primary key' => array('id', 'sid'),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not... sure this is valid, given all the joins which happen... everything thing that refers to this as a foreign key would be broken, such as the state, resource and message tracking (the islandora_batch_state, islandora_batch_resources and islandora_batch_queue_messages, respectively) are all associated with the unique entry in the queue... it would now be sharing things across sets, which probably isn't desirable... state especially... and might be desirable to reword some of the things: instead of "per-object" messages, "per-queue entry", kind of thing...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"broken" might be the wrong word... more like... consistent in unexpected ways, when navigating the GUI?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam-vessey look at
https://gist.github.com/DiegoPino/e7819c6f67e50b7a4193e43b2744e1f3 and let me know what do you think there. That also implies changing/adding a few db methods to deal with the fact that we wont be able to trust that result of operations will give is always a single queue element for a given PID

In any case Foreign keys will/won't be broken at all and without any extra stuff functionality stays the same with this change. Drupal makes no use of Foreign keys (see them as annotation properties for future work that won't happen anymore). Best example is that our foreign keys are already wrongly defined (see how the gist changes that), we were not using 'columns' key and still nothing ever broke.

The way i see this is that this primary key is more an internal optimization to A) disregard the integrity table conflict/error and B) make search in the table faster.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance, the gist looks fine (a diff/something highlighting the changes would be more readable), but it's not so much the schema itself I'm concerned with, 'cause yeah, foreign keys are just descriptive as you point out; however, they're nice to keep up-to-date for to facilitate these types of discussions...

What I'm more concerned with is more the things like the various state checks:

... The checks on parents as well would likely have to change, since they can no longer be the single columns check... probably safe to constrain to the same set:

->where('c.parent = p.id')

... There would be some views adjustments also in order, to constrain things to items in the given set:

  • 'islandora_batch_queue' => array(
    'left_field' => 'id',
    'field' => 'parent',
    ),
  • protected function getLinks($values) {
    return array(
    array(
    'title' => t('Set item state'),
    'href' => format_string('!base/!item_id/set_state', array(
    '!base' => $this->actionBasePath(),
    '!item_id' => $this->getItemId($values),
    )),
    'query' => drupal_get_destination(),
    ),
    array(
    'title' => t('Delete item'),
    'href' => format_string('!base/!item_id/delete', array(
    '!base' => $this->actionBasePath(),
    '!item_id' => $this->getItemId($values),
    )),
    'query' => drupal_get_destination(),
    ),
    );
    }

... along with adjusting the menu routes used by views to accept the set:

  • $items['islandora_batch/item/%/delete'] = array(
    'title' => 'Delete "@item" from queue',
    'title arguments' => array('@item' => 2),
    'page callback' => 'islandora_batch_delete_item_page_callback',
    'page arguments' => array(2),
    'access callback' => 'islandora_batch_item_access',
    'access arguments' => array(2),
    'file' => 'includes/menu.inc',
    );
    $items['islandora_batch/item/%/set_state'] = array(
    'title' => 'Set "@item" state',
    'title arguments' => array('@item' => 2),
    'page callback' => 'islandora_batch_set_item_state_page_callback',
    'page arguments' => array(2),
    'access callback' => 'islandora_batch_item_access',
    'access arguments' => array(2),
    'file' => 'includes/menu.inc',
    );

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally. I agree with you @adam-vessey . Those changes are the ones i was speaking about to make this pull not only something that does not break existing functionality and allows other cases, but be fully robust on those future cases. If you agree with the general structure of that gist, i can take that and move forward with changing every place there the assumption is a PID is the only key to rule them all to PID, SET based display and functionality. Will see what my agenda allows me this week, but should be pretty trivial, have already annotated all my findings + yours = feel all based would be covered. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, coming back to this, sorry for the long wait. Will have an update this week.

@bondjimbond
Copy link

Any development on this, @DiegoPino? Code freeze is coming!

@DiegoPino
Copy link
Author

DiegoPino commented Oct 16, 2018 via email

@bryjbrown
Copy link
Member

@DiegoPino Code freeze is this Friday, Oct. 19.

@DiegoPino
Copy link
Author

DiegoPino commented Oct 16, 2018 via email

@bondjimbond
Copy link

@DiegoPino Looks like this has been at rest for quite a while, but still necessary for IMI, correct? I would love to pull this in, as I'd like to start using IMI's update feature.

It looks like you have answers to all of the review comments, just need to bring them into this PR?

@DiegoPino
Copy link
Author

@bondjimbond true. Also true, sick day today. Adding this to my Friday afternoon agenda (searching for an empty sport...seek, seek). Give me 24hrs, if i don't come back please please ping me. I do have totally too much on my hands but i agree this needs to go into core someday. Thanks a lot

@bondjimbond
Copy link

Hi @DiegoPino! Pinging you as requested.

@DiegoPino
Copy link
Author

@bondjimbond you are the best! Thank you! Doing some code adjustments and some testing since Friday. Will let you know when its ready for a re-review, should be circa tomorrow afternoon. =)

@bondjimbond
Copy link

Huzzah! Thanks. This will be a very useful addition. One of my sites is itching to do a batch update on their objects with a CSV!

@bondjimbond
Copy link

@DiegoPino Prodding again just to check how it's going.

@DiegoPino
Copy link
Author

DiegoPino commented May 15, 2020 via email

@bondjimbond
Copy link

Any luck with this?

@DiegoPino
Copy link
Author

Still need to rebase but there is also an hook i need to implement to make sure the DB gets the significant changes here. Pretty sure it will take me a few more days. Sorry

@bondjimbond
Copy link

Hi @DiegoPino -- just a reminder that this still needs to be rebased and reviewed before IMI is fully functional. Any progress since last we spoke?

@bondjimbond
Copy link

@DiegoPino Looks like the conflict is just changing your update number to 06 instead of 05 (which exists on the main branch already)

@DiegoPino
Copy link
Author

@bondjimbond but my branch still does not address all the concerns. Its not really just the conflict. I mean i can fix the conflict using the web if that is all

@DiegoPino
Copy link
Author

There. Conflict is solved. Still not sure this is ready

@bondjimbond
Copy link

there is also an hook i need to implement to make sure the DB gets the significant changes here

Was this the only remaining issue, or are there others you know of?

@adam-vessey
Copy link

@bondjimbond : The stuff here?: #105 (comment)

@DiegoPino
Copy link
Author

@bryjbrown exactly what @adam-vessey says

@DiegoPino
Copy link
Author

@bondjimbond i would love to make more time for this, but pretty sure i can not. And IMI needs a lot of new stuff too. Maybe anyone else has the bandwidth to make a pull against my branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants